@TITC
Collaborator

@TITC TITC commented Apr 2, 2022

  1. Normalization is influenced by the alpha channel.
    grayscale = (data[..., 0]-data[..., 0].min()) / (data[..., 0].max()-data[..., 0].min())*255
  2. The paste size does not match.
    padded.paste(im, (0, 0, im.size[0], im.size[1]))
  3. The wrong padding pixel is used when the text is inverted.
    The text is sometimes inverted, but the padding pixel is hard-coded to 255.
    padded = Image.new('L', dims, 255)

I noticed that this causes recognition errors: when the text pixels are 255 and the padding pixels are also 255, the padded region is recognized as text.
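
As a minimal sketch of the normalization in point 1, assuming `data` is a NumPy array whose first channel holds the gray values. The helper name `minmax_to_uint8`, the sample values, and the guard for a constant channel are illustrative additions, not part of the proposed change:

```python
import numpy as np


def minmax_to_uint8(channel):
    """Stretch a single channel to the range 0..255 (min-max normalization)."""
    channel = np.asarray(channel, dtype=np.float64)
    lo, hi = channel.min(), channel.max()
    if hi == lo:
        # Assumption: a constant channel has no contrast to stretch, so
        # return zeros instead of dividing by zero.
        return np.zeros(channel.shape, dtype=np.uint8)
    return np.round((channel - lo) / (hi - lo) * 255).astype(np.uint8)


# Made-up two-channel array: channel 0 is gray, channel 1 is alpha.
data = np.zeros((2, 2, 2), dtype=np.uint8)
data[..., 0] = [[10, 20], [30, 50]]
data[..., 1] = 255

# Only channel 0 is used, so the alpha values cannot affect the result.
grayscale = minmax_to_uint8(data[..., 0])
```

Without the guard, a channel in which every pixel has the same value would make the denominator `max - min` zero.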

TITC and others added 2 commits April 2, 2022 16:17
@lukas-blecher
Owner

Thanks for the contribution.
I have a couple of comments:

  1. The normalization influenced by the alpha channel:
    You are correct. The previous line was not working quite as intended, but right now there is a bigger problem: images with a black font and a transparent background are zero everywhere in the first channel. All the information is in the alpha channel, so the line you proposed would result in an error.
    With this in mind, I moved some code up to solve the issue altogether. The result is an array with only one channel.
  2. I don't understand what the difference is. Could this be related to the "images do not match" error in the pad function (#76)?
  3. It is quite important that the padding pixel is 255. If the text is white, the image is inverted beforehand so that there is always a black font on a white background.

I have implemented these changes and pushed them to your branch (49480af). Feel free to comment.
This PR also solves the same problem as #113 so I'll close that one.
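
The transparent-background case from point 1 can be illustrated with a small NumPy sketch. This is not the code from the pushed commit; the helper name `flatten_alpha_to_gray`, the choice to composite onto a white background, and the tiny test array are assumptions made for illustration:

```python
import numpy as np


def flatten_alpha_to_gray(rgba, background=255):
    """Composite an (H, W, 4) uint8 RGBA array onto a solid background
    and return a single-channel (H, W) uint8 array."""
    rgb = rgba[..., :3].astype(np.float64)
    alpha = rgba[..., 3:4].astype(np.float64) / 255.0
    composited = rgb * alpha + background * (1.0 - alpha)
    # Simple channel average as the gray value (an illustrative choice).
    return np.round(composited.mean(axis=-1)).astype(np.uint8)


# Black glyph on a fully transparent background: every colour channel is
# zero, and only the alpha channel says where the text is.
rgba = np.zeros((4, 6, 4), dtype=np.uint8)
rgba[1:3, 2:4, 3] = 255  # opaque "text" pixels

first_channel = rgba[..., 0]          # all zeros, carries no information
flat = flatten_alpha_to_gray(rgba)    # 0 where the text is, 255 elsewhere
```

Here `first_channel.max() - first_channel.min()` is zero, which is why a min-max normalization of the first channel alone cannot work for such images, while the flattened array has black text on a white background.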

@lukas-blecher lukas-blecher mentioned this pull request Apr 3, 2022
@lukas-blecher lukas-blecher merged commit 08053ab into lukas-blecher:main Apr 3, 2022
@TITC
Collaborator Author

TITC commented Apr 3, 2022

Thanks for your attention, @lukas-blecher.

  1. The normalization influenced by the alpha channel
    For the case of a black font on a transparent background, I removed the transparent channel in suport RGBA #113 (comment).
    I tested with the image below and it passed.
    MommyTalk1648698826232

  2. paste size does not match
    As the docs mention, getbbox "Calculates the bounding box of the non-zero regions in the image." I noticed that the size returned by the getbbox API is sometimes smaller than im.size. Unfortunately, my WSL broke this evening, so I can't reproduce it right now.

  3. padding pixel is 255

so that there is always a black font on white ground.

Here is the input image:
lALPDefR35_l1-3NA2zNBmQ_1636_876

And here is the padded image. The line at the bottom doesn't exist in the original picture but appears after padding. I think this line is added by the padding at an intermediate step.
lALPDeC24lZY92PMgM0BIA_288_128
Here are more intermediate images; you can match the name prefixes to the variables.
image

If the above information is not enough, I will add more after my WSL is repaired.😁


I encountered these problems with those images. If you do not mind, you can test them before I have repaired my WSL.
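
On the paste-size point (item 2 above): in Pillow, `Image.paste` with a 4-tuple box raises a `ValueError` when the box size differs from the size of the pasted image, whereas a 2-tuple box only gives the upper-left corner. A minimal sketch, assuming a hypothetical helper `pad_to_multiple`, a made-up multiple of 32, and a synthetic test image (none of these are taken from the repository):

```python
from PIL import Image, ImageOps


def pad_to_multiple(im, multiple=32, fill=255):
    """Pad an 'L' image on the right and bottom up to the next multiple."""
    w, h = im.size
    new_w = -(-w // multiple) * multiple  # ceiling division
    new_h = -(-h // multiple) * multiple
    padded = Image.new('L', (new_w, new_h), fill)
    # A 2-tuple box gives only the upper-left corner, so no size can mismatch.
    padded.paste(im, (0, 0))
    return padded


# Synthetic image: a white "text" block (non-zero) on a black canvas.
canvas = Image.new('L', (50, 20), 0)
canvas.paste(255, (10, 5, 30, 15))

bbox = canvas.getbbox()                   # bounding box of the non-zero region
cropped = canvas.crop(bbox)               # smaller than canvas.size
dark_on_light = ImageOps.invert(cropped)  # black text, white background
padded = pad_to_multiple(dark_on_light)

# A 4-tuple box whose size differs from the pasted image is rejected.
try:
    Image.new('L', (32, 32), 255).paste(cropped, (0, 0, 32, 32))
    mismatch_raises = False
except ValueError:
    mismatch_raises = True
```

Because the text is dark after the inversion, the white padding added on the right and bottom cannot be confused with text pixels.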

@TITC TITC changed the title fix bugs some suggestion Apr 3, 2022
@TITC TITC changed the title some suggestion some suggestions Apr 3, 2022
@lukas-blecher
Owner

Thanks for the detailed comment.
I don't get the black line when padding the image above.
If I input the same image as you, the final downsampled image is the following:
padded
Maybe it's some leftover code on your side?

@TITC
Collaborator Author

TITC commented Apr 3, 2022

Thanks for your reply.

I can't rule out that possibility. I will roll back to this version to eliminate unrelated variables. It's late in China now, so I will reply as soon as possible tomorrow.
image


BTW, could you teach me how to build such a brilliant formula recognition project from scratch? Any advice is welcome. I am new to this area and want to learn from you.

@TITC
Collaborator Author

TITC commented Apr 3, 2022

You are right.

I wrote a careless line that served no purpose:

data = np.stack((grayscale, grayscale), axis=-1)

It makes rect[..., -1] non-zero when it should be 0.
image
As a result, the pixels are inverted again:

im = Image.fromarray((255-rect[..., -1]).astype(np.uint8)).convert('L')
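
The effect described above can be reproduced in a minimal NumPy sketch. The 2x2 array is made up for illustration, and this is not the repository's actual padding code:

```python
import numpy as np

# Hypothetical 2x2 grayscale image: 0 = black text, 255 = white background.
grayscale = np.array([[0, 255], [255, 0]], dtype=np.uint8)

# The redundant line: both channels now hold the image itself.
data = np.stack((grayscale, grayscale), axis=-1)  # shape (2, 2, 2)

# The last channel is a copy of the image, so it is no longer all zeros ...
last_channel_nonzero = bool(data[..., -1].any())

# ... and subtracting it from 255 flips black and white.
inverted = (255 - data[..., -1]).astype(np.uint8)
```

After the subtraction, the text pixels are 255 and the background pixels are 0, which is the opposite of the intended black font on a white background.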

I think the main reason for this misjudgment is that I do not really understand the main logic behind your code. Could you give me any advice?

@lukas-blecher
Owner

I'm happy to give any insight if you have a specific question.
I don't know what to tell you regarding the main logic.
Sorry for the lack of comments in the code. I understand that it is difficult to see the purpose sometimes.

@TITC
Collaborator Author

TITC commented Apr 4, 2022

Glad to hear your reply.

Your code is much better than my coworkers'. My point is that my confusion comes from gaps in my knowledge.


Here is what I have:

  • Some knowledge learned from Andrew Ng's deep learning curriculum.
  • Some working experience with NLP in Chinese.
  • Some familiarity with semantic segmentation and object detection, with project experience in both.
  • Read some papers, like Attention Is All You Need, BERT, Unet, Mnet and so on.
  • Familiarity with the APIs of TensorFlow, Keras, and PyTorch.
  • Linear algebra, learned from MIT's Gilbert Strang on YouTube.
  • Further mathematics.
  • Probability and statistics.

Here is what I want to know:

  • Could you recommend some papers related to this repo, other than the ones below that I am already reading?
    image

  • How can I get started in this area? Any tutorial you recommend is welcome. I am totally new to formula recognition, and I do not want to be a person who just knows how to call an API. I want to go deeper.

  • How can I build a small demo by myself from scratch? Is there anything else I need to know?

@TITC
Collaborator Author

TITC commented Apr 4, 2022

In fact, I want to build a handwritten formula recognition project, and I noticed that your to-do list includes this. Is there any possibility that I could become a collaborator on this project and learn from you? 😁

image

@TITC TITC deleted the patch-1 branch April 4, 2022 14:09
@lukas-blecher
Owner

I sadly don't have any more papers for you. When I started this project the ViT was freshly proposed and I wanted to make a formula recognition model. I can tell you though it is helpful to have a CNN backbone in the encoder.

Regarding the handwriting project: I see you already noticed the colab notebook I linked in the README: https://colab.research.google.com/drive/1ba_qCGJl29dFQqfBjdqMik3o_EqPE4fr
There I outlined how to finetune the model on both rendered and handwritten formulas.
I just didn't have the time yet to fully train the models.

@TITC
Collaborator Author

TITC commented Apr 4, 2022

Thanks for your advice. I look forward to your further results.
